ABSTRACT



Chemical Abstracts Service (CAS), in conjunction with the National
Science Foundation, conducted the first public demonstration of CAS
computer-based substructure search techniques at the 152nd Meeting of the
American Chemical Society in New York City. From September 11 through
September 16, 1966, interested persons were given the opportunity to see
substructure search operations and to determine the technique's capabilities
and potentialities in light of their own needs.

The purpose of substructure searching is to enable technical per- 
sonnel to automatically search for chemical structures and substructures 
that have been reported in the literature and registered in the CAS 
Chemical Compound Registry System. The New York City demonstration used
a "breadboard model" of the Substructure Search System, i.e., a version
capable of producing all of the correct answers to the questions, but a
model without the refinements that will be available in an operational
system. Because of the dialogue between CAS scientists and practicing
chemists and chemical engineers, CAS is now able to make significant
technical improvements to better serve the informational needs of the
chemical community.

This report describes the demonstration, the breadboard model, and the
results of the demonstration, as well as the improvements suggested as a
result of the experiment.

DEMONSTRATION OBJECTIVES

The New York demonstration was an experiment. It was designed to
determine, under conditions approaching those that could be expected to
prevail for an operational system, the adequacy and efficiency of the
substructure search techniques and their reception among practicing chemists,
chemical engineers, and others who require chemical information. Among the
many specific objectives of the demonstration were the following:

1. To acquaint the technical public with a computer-based technique 
that would rapidly recall and collate chemical data based on 
chemical structures, and to allow the CAS staff to gain valuable 
experience in such areas as question framing and coding, dialogue 
with users, and remote-location operations.

2. To determine the types of questions that would be asked and, in
general, to determine what the practicing chemist and chemical
engineer wanted the system to do for him.

3. To assess existing techniques for such procedures as screening, 
coding, and remote searching, and to collect additional design 
data that might lead to their improvement. 

4. To acquire actual operating data such as machine times and answers
per question. 

The success of the demonstration in meeting the goals outlined above
is summarized in the following sections. A glossary of terms used in
substructure searching appears in Appendix A, while detailed statistical
data on the questions asked and answers retrieved are provided in Appendix
B. Appendix C gives detailed information about the screens used. Appendix
D gives characteristics of the Demonstration File, Appendix E lists the
questioners and their organizational affiliations, and Appendix F provides
examples of the questions asked and hits retrieved.

THE CAS SUBSTRUCTURE SEARCH SYSTEM

The Substructure Search System is being developed as part of the over- 
all CAS computer-based chemical compound-handling system. It is being 
designed to locate within a file of structures in connection table form all 
compounds that possess one or more specified substructures. 

Essential to the concept of the Substructure Search System is the pro- 
vision of maximum flexibility in both question and answer specificity. To 
provide the desired flexibility, the search technique being designed at CAS 
operates at several levels of specificity. At one level, chemical fragment
screens, many of which correspond to functional groups with which every 
chemist is familiar, are used to select from the whole file those compounds 
that include potential answers to the question. Such screening is a very 
rapid and relatively inexpensive way to select compounds from a file. De- 
pending upon such things as the size of the list of answers, the relationship 
between the sought-after substructure and the retrieved structure, and the 
cost of the search, this level of search may provide answers that are quite 
satisfactory to the questioner. Nevertheless, if greater specificity is 
desired, an iterative, atom-by-atom, bond-by-bond search level is available 
which can reduce the list of candidate structures to include only those that 
meet the more exact specifications. In no case will the search system
eliminate exact answers to a question; rather, the "non-answers" are rejected.
At each level of specificity, the user will have the option to either
terminate or continue the search, based upon the results of the previous step.

Clearly, screening is a critical phase of substructure searching, one
that determines the efficiency of the search and hence the cost of searching.
Most screens are produced by relatively inexpensive screen generation
programs that automatically strip the various fragment types (atom counts,
ring counts, etc.) from the computer structural record. Approximately 1500
such fragments are used for screening purposes.

Screens are represented in the computer file by "bit indicators." 
Associated with each compound in the file is a series of these indicators 
(binary digits), each of which corresponds to a specific screen item. Each 
bit acts like a switch: if the compound possesses the screen item corre- 
sponding to a particular bit, the bit is set to "on." If the compound does 
not possess that item, the bit remains off. Once such a record has been 
established for each compound in the file, screening can be accomplished 
for a substructure search question merely by setting up a bit-indicator 
record for the question showing the screen items to be located. The record 
for the question is then compared with the records for the compounds on file 
in a quick and easily accomplished computer procedure. It should be noted
that once the indicators are set for a file of compounds it is not necessary
that they remain static. These screen assignments can be altered to fit a
given operating environment.
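
The bit-indicator comparison described above can be sketched in a modern language. The Python fragment below is only an illustration under stated assumptions: the screen item names, the Registry Numbers, and the function names are hypothetical, and the report does not give the actual record layout used on the IBM 7010.

```python
# Hypothetical screen items; the report cites roughly 1500 real fragments.
SCREEN_ITEMS = ["ring", "C=C", "CF3", "N", "OH", "Cl"]
BIT = {name: 1 << position for position, name in enumerate(SCREEN_ITEMS)}


def make_record(items):
    """Build a bit-indicator record: one bit set "on" per screen item present."""
    record = 0
    for item in items:
        record |= BIT[item]
    return record


def screen(question, file_records):
    """Return the Registry Numbers whose records contain every question bit."""
    return [reg_no for reg_no, record in file_records.items()
            if record & question == question]


# Hypothetical three-compound file keyed by made-up Registry Numbers.
file_records = {
    101: make_record(["ring", "C=C", "CF3", "N"]),
    102: make_record(["C=C", "CF3", "N", "OH"]),
    103: make_record(["ring", "Cl"]),
}
question = make_record(["C=C", "CF3", "N"])
candidates = screen(question, file_records)
print(candidates)  # [101, 102]
```

Because each comparison is a single AND and equality test per compound, the sketch also suggests why screening is so much cheaper than an atom-by-atom search.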

The Substructure Search System incorporates Boolean logic, and questions
may be posed in terms of "and", "or", and "not" logic. "And" logic requires
presence of an atom or group of atoms in the answer. "Not" logic specifies
that an atom or group of atoms must not be present in the answer. "Or"
allows alternatives, one of which must occur in every retrieved structure.
A fourth listing, "Don't Care", allows atoms and bonds within the
substructure to be left unspecified.
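
A minimal sketch of how the three kinds of logic might combine at the screening level follows. It assumes each compound is summarized as a set of screen item names; the function name, parameter names, and item names are illustrative and are not taken from the CAS programs. "Don't care" needs no code at all in this picture: an unspecified item is simply left out of every list.

```python
def passes(compound_items, all_of=(), any_of=(), none_of=()):
    """Apply "and", "or", and "not" logic to one compound's screen items."""
    items = set(compound_items)
    if not items.issuperset(all_of):          # "and": every item must be present
        return False
    if any_of and items.isdisjoint(any_of):   # "or": at least one alternative
        return False
    if not items.isdisjoint(none_of):         # "not": none may be present
        return False
    return True


# Hypothetical question: nitrogen required, chlorine or bromine required,
# sulfur forbidden.
print(passes({"N", "Cl", "ring"}, all_of={"N"}, any_of={"Cl", "Br"}, none_of={"S"}))
print(passes({"N", "Cl", "S"}, all_of={"N"}, any_of={"Cl", "Br"}, none_of={"S"}))
```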

Figure 1 shows a typical substructure search question and illustrates
how answers are dependent upon question specificity. The question allows
the three bonds marked by arrows to appear in either a ring or a chain.
The first answer shows the substructure embedded within two rings. The
second answer has no rings, while the third is a ring-chain combination.
Had the "don't care" bonds of the question all been limited to ring bonds
in the indicated positions, only the first answer would have been
satisfactory. Had they been limited to chain bonds, only the second answer
would have been obtained. In neither of these last two possibilities would
the third structure have been retrieved.

[FIGURE 1. Typical substructure search question: F3C-C=C-C-N, with three
bonds marked "Ring or Chain", followed by three answer structures. The
structure diagrams are not reproduced.]

The CAS Substructure Search System is experimental, and certainly not
all of its potential uses have yet been recognized. Therefore, it is
expected that many more than the four applications outlined below will be
found for the system as it matures and as potential users become more
acquainted with it.

1. The general use to which the system will be put is that of
substructure search, that is, searches of a file of structures
to locate those that contain similar structural characteristics.
Such searches are by no means limited to the CAS computer; they
could be conducted by other institutions or organizations.

2. Since compounds containing specified substructures can be identified
during the registration process, this system can provide
an alerting service for new compounds containing substructures
of interest to any given user. Moreover, since all ring systems
indexed in the subject index to Chemical Abstracts are registered,
any new ring system entering the system can automatically be
identified, even when it is embedded in another structure.

3. The system provides the mechanism for automatically generating
fragmentation codes for updating a user's fragment search file
whether it be computerized or manual. By interrelating the fragments
of a manual system and the screens of an associated computer
search system, the latter can be used efficiently to supply more
specific answers than are obtainable by a manual search.

4. If a substructural hierarchy is established (which may be varied
at each use) for printing out a list of answers, the system can
be used to organize a series of structures without depending
upon systematic nomenclature or human intervention. In addition,
if structures are available directly from the computer, it is
possible to pose whole-question or substructure questions
directly to the system in diagrammatic language and receive an
organized list of answers in the same form. This work has
already been accomplished for small systems by several groups
and is now under development for large systems by CAS.

DEMONSTRATION DESCRIPTION



The substructure search techniques demonstrated at New York City were
performed using a "breadboard" model of the operational system. That is,
the components used for the demonstration were not specifically designed to
be interlinked, and although the demonstrated system was fully capable of
selecting all of the answers from the files for a substructure search
question, it did not possess the operational sophistication required of a
heavily used system; it did not perform many of its functions in an efficient
manner. Moreover, some of the tasks that will eventually be performed, partly
or entirely, by computer were assigned to humans for the demonstration, and
some of the options that will eventually be offered routinely were available
only by dividing questions into two or more parts. Finally, the demonstrated
system was programmed for the IBM 7010 computer, whereas the first
operational system will employ the IBM 360 computer. Nevertheless, the system
was fully capable of its basic task: computer searching for defined
substructures in a file of compound-structure representations.

To limit the amount of time and money spent to search each question,
while at the same time providing representative search results, CAS set up
a special demonstration file of 55,396 compounds, slightly more than
one-tenth of the number of compounds registered as of September 1966. (See
Appendix B for a description of this file.) In addition, the number of
hits provided to any one question was arbitrarily limited to six.

A generalized flow diagram of the Substructure Search System is shown
in Fig. 2, while Fig. 3 details the "System" as it operated specifically
for the New York City demonstration. The process started with a face-to-face
interview between a questioner and a CAS chemist to determine the
exact details and precise meaning of a question and the objectives of the
search. Once these were discerned, the substructure was drawn and the
question keyboarded on a paper-tape-generating structure typewriter located
in New York. The data contained on the paper tape generated by the
typewriter were then transmitted by TWX to CAS headquarters in Columbus, Ohio,
where a hard copy was produced by a similar typewriter.

In Columbus, the screens were coded manually by a CAS chemist. The
coded substructure search questions were then matched against the Search
Screen File, a file which included only the Bit Indicator Screens for each
compound on the Search File and the corresponding Registry Numbers. This
screening process produced a set of Registry Numbers as candidate compounds.
At this point, some of the questions were completely answered because the
screening process determined that no exact answers existed on file or
because screening completely identified the exact answers. For the other
questions, the corresponding sets of candidates included not only those
compounds that exactly answer the search question, but also some related
compounds.

For the latter sets, an iterative search, atom-by-atom and bond-by-bond,
was made on the candidate compounds in the Structure File to select the
exact answers, referred to as "hits", to the search question.
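
The two-stage procedure (screen first, then search the surviving candidates atom-by-atom and bond-by-bond) can be sketched as follows. This is a generic backtracking matcher written for illustration, not the CAS iterative search program; the structure representation, the "*" don't-care symbol, the element-based screen, and the Registry Numbers are all assumptions.

```python
def bond(a, b):
    """Key for an undirected bond between atoms a and b."""
    return frozenset((a, b))


def find_match(query, target):
    """Map each query atom onto a distinct target atom, or return None."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    order = list(q_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return dict(mapping)
        q = order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values():
                continue
            if q_atoms[q] not in ("*", t_atoms[t]):   # "*" = don't care
                continue
            consistent = True
            for q2, t2 in mapping.items():
                wanted = q_bonds.get(bond(q, q2))
                if wanted is None:
                    continue
                found = t_bonds.get(bond(t, t2))
                if found is None or wanted not in ("*", found):
                    consistent = False
                    break
            if consistent:
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]
        return None

    return extend({})


def search(query, structure_file):
    """Stage 1: crude element screen. Stage 2: atom-by-atom search."""
    needed = {el for el in query[0].values() if el != "*"}
    candidates = [reg for reg, (atoms, _) in structure_file.items()
                  if needed <= set(atoms.values())]
    hits = [reg for reg in candidates
            if find_match(query, structure_file[reg]) is not None]
    return candidates, hits


# Hypothetical file; hydrogens omitted, bond values 1 = single, 2 = double.
structure_file = {
    201: ({1: "C", 2: "C", 3: "N"}, {bond(1, 2): 2, bond(2, 3): 1}),  # C=C-N
    202: ({1: "C", 2: "C", 3: "N"}, {bond(1, 2): 1, bond(2, 3): 1}),  # C-C-N
    203: ({1: "C", 2: "C", 3: "O"}, {bond(1, 2): 1, bond(2, 3): 1}),  # C-C-O
}
query = ({1: "C", 2: "C", 3: "N"}, {bond(1, 2): 2, bond(2, 3): 1})
candidates, hits = search(query, structure_file)
print(candidates, hits)  # [201, 202] [201]
```

In this toy file the screen rejects compound 203 outright, passes the related compound 202 as a candidate, and the iterative stage then discards 202, mirroring the distinction the text draws between candidates and hits.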

At this point, the structure of the compound, the molecular formula,
and the CA index name were reviewed by a chemist to insure that the results
were valid. Errors in coding were then cycled for recoding and re-search.

The validated structure and bibliographic data were then typed on the
structure typewriter and transmitted to New York where hard copies of the
information were produced on the structure typewriter located there. The
answers were then sorted and the abstract for one bibliographic citation
retrieved from CA on microfilm was included with the printed answers. These
and a copy of the search question were later returned to the questioner.

DEMONSTRATION RESULTS



Substructure searching has been in the development stage for several
years, but until the time of the New York City demonstration, the capability
had never been shown publicly. CAS believed that if the operational system
was to accomplish its goal (to fill a major need in the chemical researcher's
information requirements), the existing system required public exposure. The
New York meeting gave us such an opportunity. Through a special exhibit set
up at the ACS meeting, some 750 people were introduced to the search technique.
These people were provided with literature on substructure searching and had
the opportunity to discuss the system with CAS staff and to test the system
by supplying questions to it.

Some 163 persons representing approximately 110 organizations (universities,
industrial firms, governmental agencies, and research institutes)
availed themselves of the opportunity to ask questions, and 183 searches
were run during the four-day demonstration. About half of the questioners
were research chemists, while the other half were chemical information
specialists. Appendix E lists the questioners and their affiliations.

To provide experience to its staff, CAS assigned eleven chemists and
six systems personnel to conduct the New York demonstration. Five chemists
and three systems personnel were located in New York, the remainder in
Columbus. This staff was aided by a chemical-typewriter operator at each
location as well as keypunch operators in Columbus.



Since this demonstration was to be our first experience in handling a 
large and widely diversified number of substructure search questions, each 
professional involved underwent approximately 20 hours of training prior to 
the meeting. Items such as the following were discussed to familiarize
those involved with the skills they would need: 

a. Interaction with questioners. 

b. Problems of question definition. 

c. Problems of communications between New York and Columbus.

d. Coding for screens and iterative search. 

The Pattern of Questions and the Requested System Capabilities 

The New York demonstration gave CAS an opportunity to gather information
as to the types of queries that could be expected to be asked of an
operational system and to identify specific system characteristics desired
by users.

Discussions between CAS personnel and visitors to the demonstration
made it clear that any operational system must be flexible enough to serve
the spectrum of users, from the single researcher working at a university
to the research section of a large industrial firm. Question profiles,
search files, answer specificity, and output formats each pose special
problems that will vary according to the environment in which the system
is used. It is also clear from the demonstration that the system must be
capable of performing searches for both full structures and substructures,
retrospectively and on a current-awareness basis.

Even though it will be necessary that the system be customized in terms
of such items as the types of acceptable questions and the format and detail
of output, individual queries will have to be exactingly defined. Each bond,
each element, each alternative must be precisely identified (even if it is
"don't care") if the user is to obtain the response he requires. In New
York, CAS personnel questioned the user extensively to obtain this
information. Generally, we found that, although his questions were very
specific, they were imprecisely worded, and it required considerable time to
define the inquiry with sufficient detail to insure that the questioner would
receive the answers he desired. Such personal interrogation will not
ordinarily be available in a highly automated system. Instead of face-to-face
interrogation, it is expected that, in an operational system, the computer
will ask the pertinent questions that will lead to fully defined structural
questions. Computer-user dialog will help both the novice and the
experienced user to obtain satisfactory results from the system with a
minimum of effort.

Registry Numbers alone will be of little value in most applications.
As a minimum, users will have to be provided with Desktop Analysis Tools or
direct computer output that links the Registry Numbers with names that can
be searched for in printed indexes and/or structural formulas. A range of
output options must be provided; the user will want bibliographic citations,
titles, structures, and/or other printed information to help him determine
the references that contain relevant information. Perhaps hard copies of
abstracts or even the actual articles may be included as part of the
package. The choices that will be ultimately available to the consumer are
heavily dependent upon other systems and services being developed at CAS
and elsewhere.

In talking to individuals, CAS staff received several requests that
pointed to capabilities in the overall structure-handling system that were
needed. These include:

(1) the capability to integrate full-structure, substructure, and
nomenclature searches without the need for the user to make a
distinction between the various systems.

(2) the capability of searching for compounds containing specified
isotopes.

(3) the capability of allowing the user to stipulate that certain
compounds containing the sought-after substructure will be
excluded from the answers.

(4) the capability of searching for substructures that contain a
repeating group (polymeric or not) attached to a specified
group at each end, without specifying the number of repetitions.

(5) the capability of searching for structural information on
polymers and coordination compounds.

(6) the general capability of making correlative searches (with
appropriate logic) that utilize both text materials and structural
information, whether from the same file or from interrelated files.

All of the above suggestions are based on specific questions for which
the above capabilities could have been utilized. For example, some
questioners were interested in compounds possessing a specific substructure,
but they did not want to recall the compounds that they already knew
contained that substructure. Although these results can be achieved by manual
screening of the answers, CAS believes that in the interest of economy and
accuracy, the ability to exclude predetermined information should be
included in the system.



Screening, Coding, and Remote-Searching Techniques

Because of its importance to the total system, the screening program
of the Substructure Search System has received continuing attention and its
efficiency has steadily been improved.

An important objective of the New York demonstration was to determine
screening efficacy as a function of the questions asked. To make such an
evaluation, the concept of "percent screenout" is used and defined as:

Percent screenout = (No. of Compds. Eliminated by Screening / Total Compds. in File) x 100

Applying this criterion, we find that of the 183 questions asked,
123 questions were screened with at least 99% efficiency. That is, 1% or
less of the compounds on file for the demonstration passed the screens.
Thirty-six, or 20% of the questions, were screened with 95-99% effectiveness,
[...] with 90-94% effectiveness, and [...]% were screened with less than 90%
effectiveness.
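
As a worked illustration of the percent screenout measure, the short Python sketch below applies the definition to the 55,396-compound demonstration file. The function name and the sample count of 550 compounds passing the screens are chosen for illustration only.

```python
def percent_screenout(eliminated, total):
    """Percent screenout = compounds eliminated by screening / total x 100."""
    return 100.0 * eliminated / total


FILE_SIZE = 55_396          # size of the demonstration file
passed = 550                # illustrative count of compounds passing the screens

value = percent_screenout(FILE_SIZE - passed, FILE_SIZE)
print(f"{value:.2f}% screenout when {passed} compounds pass the screens")
```

With about 550 compounds passing, the screenout sits almost exactly at 99%, which is why roughly 1% of the file marks the boundary of the top efficiency class.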



[This measure is] useful on the assumption that the number of answers to
[a question is] a very small percentage of the [file]. [Under] these
circumstances, a screenout percentage near 100% [indicates] effective
screening. However, for a question in which the number of answers is a
significant percentage of the file size, the percent [...] screens operate
perfectly such that no iterative [search] is required. For example, one
question asked [...] 1[...],806 answers, all of which were found by
screening.
Of the questions with less than 99% screenout, several had more than
550 answers (1% of the file), and therefore could not have realized 99%
screenout. Nevertheless, other questions for which large numbers of
compounds passed the screens highlighted the need for some additional screens.
Several that are to be added to the system are:

(1) screens for carbocyclic and heterocyclic rings of specified sizes;

(2) generic level screens for a carbocyclic ring of any size and for
a heterocyclic ring of any size;

(3) a carefully selected group of screens for complex ring systems
such as those illustrated by anthracene, phenanthrene, and
benzindene, etc.;

(4) additional chemically significant fragments, including some for
atom chains of varying lengths (e.g., 4, 5, or 6 atoms);

(5) addition of some generic-level, chemically significant fragments
which would simply show connectivity relationships without
specifying particular atoms.

At present, most of the screens used in substructure searching are
structurally specific; they require the presence of specific atoms, specific
bonds, etc., for all potential answers to the substructure search questions
(see Appendix C for screen descriptions). Consequently, the less specific
the search questions (e.g., the greater the number of "don't care" atoms
or bonds), the less effective are the screens. A few generic level screens
were used for the New York demonstration, but our experience there taught
us that a substantially greater number are required for effective screening.
The screens developed to fulfill the needs described in Nos. 4 and 5 above
are described in the next section of this report.

Another technique that was evaluated in light of the New York demon- 
stration was the coding of substructure search questions. Our experience 
at the demonstration bore out our previous feeling that the human encoding
of questions now required is too inefficient to be used in an operating
system. The human requirements for coding are too extensive to be
performed for any system subject to heavy use. In addition, manual coding is
too complex to handle without extensive training. Although the need for
some limited amount of manual coding of questions, for both screening and
iterative search, may always exist, it must be simplified. However, it is
expected that in the operational system, the computer will be the major
instrument used to code questions for both screening and iterative searching.
The coding procedure will probably be started as the user types the
structure on a chemical typewriter or possibly on an on-line device such as the
IBM 2250 (essentially a chemical typewriter incorporating a cathode-ray
tube with a light pen for real-time playback). Through a translation
program such as now used for the Registry System, the information will be coded
to a connection table from which the screens will be generated. Through
another translation program, the information will be coded in the form
needed for the iterative search. Throughout this process, a
computer-directed dialogue with the user will help him to frame his question
with appropriate precision to maximize his ability to gain useful answers from
the system.

A related question investigated in light of the New York City
demonstration involved the techniques of remote searching and the handling of
structures on the structure-typing typewriter. In general, both procedures
were entirely satisfactory, the structure typewriter proving a useful tool,
and the remote terminal setup operating satisfactorily. Although about a
dozen questions were either garbled in transmission or were typed at the
remote end with insufficient information, this caused only minor problems
that were solved with a telephone call.

SYSTEM ENHANCEMENT BASED UPON DEMONSTRATION RESULTS

One of the major reasons for the New York demonstration was to give
CAS an opportunity to detect areas within the system that needed further
investigation. Because the substructure searching methodology was
subjected to a critical review by the type of individuals that would use an
operational system, we were able to detect the areas that needed
strengthening. Two such areas are discussed below.

Additional Screen Capability

It has been previously stated that one of the more important lessons
learned from the New York demonstration was the need
for additional screen types, including some intermediate generic screening
capabilities. Concerning the latter, most of the approximately 1500 screens
used for the demonstration were either too specific or so generic as to
reduce the screen efficiency below acceptable limits for certain questions.
Had certain screens of intermediate generic nature been available to rapidly
separate the candidate structures from the total file, less iterative
searching would have been required. In an operating system, this would result
in a less expensive operation since, as would be expected, screening is much
less time consuming than iterative searching.

As a result of the demonstration, two new types of screens are being
instituted in the Substructure Search System. These are (1) the Degree of
Connectivity Screen and (2) the Linear Sequence Screen. In addition, some
intermediate generic levels are being introduced into the "Triplet" and
"Moiety" screens.



1. Degree of Connectivity Screen

This screen type is defined as the minimum number of atoms having N or
more nonhydrogen attachments, where N can equal 3, 4, 5, or 6. For example,
the structure illustrated below would satisfy the substructure search
requirement of possessing one or more atoms with a degree of connectivity of
four, because Atom No. 1 has four nonhydrogen atoms attached to it (Nos. 2, 3,
4, and 5).

[Structure diagram not reproduced.]

The structure would also satisfy the requirements for three or more atoms
with a degree of connectivity of three, since Atom Nos. 1, 2, and 3 each
have at least three nonhydrogen atoms attached.

The advantage of this screen type is that it enables one to utilize
the discriminatory power of atoms having degrees of connectivity of 3 or
greater even when all of the atoms in the substructure search request are
not identified specifically. For example, the substructure search request
illustrated below has one atom with a degree of connectivity of 4 and also
has three atoms with a degree of connectivity of 3 or more (the atom with a
degree of connectivity of 4 is also counted as an atom with a degree of
connectivity of 3 or more).

[Structure diagram not reproduced.]

The use of these screens in conjunction with
others that are now available will help reduce the number of structures
that will have to be searched atom-by-atom.
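
The Degree of Connectivity Screen lends itself to a short sketch. The Python below counts, for each N, the atoms with N or more nonhydrogen attachments, and then tests whether a file structure could contain a question. The nine-atom structure is a hypothetical stand-in built to match the description in the text (Atom 1 attached to Atoms 2 through 5, with Atoms 2 and 3 each having three nonhydrogen attachments), since the original diagram is not reproduced; the function names are assumptions as well.

```python
def connectivity_counts(atoms, bonds, levels=(3, 4, 5, 6)):
    """For each N, count atoms with N or more nonhydrogen attachments."""
    degree = {atom: 0 for atom, element in atoms.items() if element != "H"}
    for a, b in bonds:
        if atoms[a] != "H" and atoms[b] != "H":
            degree[a] += 1
            degree[b] += 1
    return {n: sum(1 for d in degree.values() if d >= n) for n in levels}


def passes_connectivity(question_counts, structure_counts):
    """A structure can contain the question only if it has at least as many
    atoms at every degree-of-connectivity level."""
    return all(structure_counts[n] >= question_counts[n]
               for n in question_counts)


# Hypothetical structure: all carbon, hydrogens omitted.
atoms = {i: "C" for i in range(1, 10)}
bonds = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (3, 8), (3, 9)]

counts = connectivity_counts(atoms, bonds)
print(counts)  # {3: 3, 4: 1, 5: 0, 6: 0}
print(passes_connectivity({3: 3, 4: 1}, counts))
```

Note that the count at N = 3 includes the atom counted at N = 4, as the text specifies.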
2. Linear Sequence Screen

The Linear Sequence Screen is defined as a series of 4, 5, or 6 specific
atoms and the bonds uniting them. The only bond specificity is whether they
are chain or ring bonds. For example, among the linear sequences present in
the three structures shown below, the most discriminatory ones are indicated
beneath each structure (a ring bond is designated by an asterisk and a chain
bond by a hyphen).

[Three structure diagrams not reproduced; the one surviving sequence label
is Cl-C*C-Cl.]

Such discriminating power was not possible with our previous screens.
Consequently, in a structure search request for ortho-dichlorobenzene all three
of the above structures would have passed the screens and would have to be
iteratively searched. With the Linear Sequence Screen described above, two
of the structures would be screened out, thereby reducing the total amount
of iterative search time required to retrieve the desired structures.
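
A sketch of how linear sequences might be enumerated is given below, using the report's notation (asterisk for a ring bond, hyphen for a chain bond). The three test structures are assumed to be the ortho, meta, and para isomers of dichlorobenzene, which is a guess consistent with the text but not confirmed by it, since the diagrams are not reproduced. Ring or chain character is supplied as input rather than perceived, and all names are illustrative.

```python
def linear_sequences(atoms, bonds, length=4):
    """Collect every linear sequence of `length` atoms as a string such as
    "Cl-C*C-Cl" ("*" = ring bond, "-" = chain bond)."""
    neighbours = {atom: [] for atom in atoms}
    for (a, b), kind in bonds.items():
        neighbours[a].append((b, kind))
        neighbours[b].append((a, kind))

    found = set()

    def walk(path, text):
        if len(path) == length:
            found.add(text)
            return
        for nxt, kind in neighbours[path[-1]]:
            if nxt not in path:            # a linear sequence never revisits an atom
                walk(path + [nxt], text + kind + atoms[nxt])

    for start in atoms:
        walk([start], atoms[start])
    return found


def dichlorobenzene(second_cl_position):
    """Benzene ring (atoms 1-6) with Cl on atom 1 and on the given ring atom."""
    atoms = {i: "C" for i in range(1, 7)}
    atoms[7] = atoms[8] = "Cl"
    bonds = {(i, i % 6 + 1): "*" for i in range(1, 7)}
    bonds[(1, 7)] = "-"
    bonds[(second_cl_position, 8)] = "-"
    return atoms, bonds


SCREEN = "Cl-C*C-Cl"
for name, position in (("ortho", 2), ("meta", 3), ("para", 4)):
    print(name, SCREEN in linear_sequences(*dichlorobenzene(position)))
```

Under these assumptions only the ortho isomer carries the Cl-C*C-Cl sequence, so the other two structures would be screened out before any iterative search.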
Computer Editing of Search Questions

The experimental substructure search "system" demonstrated in New York
incorporated a number of computer editing routines to check the validity of
information coded in the screens for each question. Among these were checks
for keyboarding errors and checks to substantiate that coded screens were
available for use. However, since completely computerized editing routines
had not as yet been provided, a substantial amount of manual editing had to
be done, far more than would be tolerable in an operational system. To reduce
the amount of manual effort required for this purpose, appropriate computer
editing routines will be written to check the validity of information coded
for search. Examples of the type of editing checks to be provided are:

Checks for Allowable Ch a racters in a Given 

In the search coding operations, the type of character aUowed in cer- 
tain columns is restricted. For example, only numeric characters are allowed 
in the columns of bond values and valences. i„ the operational system, if 
an alphabetic character is mispunched in one of these columns, the informa- 
tion will be rejected and appropriate diagnostics describing the reason 

be produced. In other instances, only certain characters may be entered, 
yor example, in the columns reserved for Boolean logic operators, only the 
letters A, 0, or N (for "A®", "OR", and "NOT" logic, respectively) are 
allowed. If an invalid character is used, the information will be rejected 
and an appropriate diagnostic will be produced. 







The type of data allowed in certain fields is also restricted. For 
example, several two-column fields will accept only the symbols for the 
elements or certain numerical values assigned to atoms that have been 
previously cited elsewhere in the iterative search question. If invalid data 
(e.g., an invalid element symbol or an invalid number) appears in one of 
these fields, the information will be rejected and an appropriate diagnostic 
will be produced. 
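
The editing checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the three-field question line, and the short element list are hypothetical and are not taken from the CAS programs; only the rules themselves (numeric bond values, the letters A, O, N for logic, and element symbols or previously cited atom numbers in atom fields) come from the text.

```python
# Hypothetical sketch of the editing checks; names and field layout are assumptions.
LOGIC_OPERATORS = {"A": "AND", "O": "OR", "N": "NOT"}

# Illustrative subset only; a real routine would hold every element symbol.
ELEMENT_SYMBOLS = {"C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "Si"}


def check_numeric(field_name, value):
    """Return a diagnostic if the field is not purely numeric, else None."""
    if not value.isdigit():
        return f"{field_name}: expected numeric characters, found {value!r}"
    return None


def check_logic_operator(value):
    """Only the letters A, O, or N are allowed in a logic column."""
    if value not in LOGIC_OPERATORS:
        return f"logic operator: expected A, O, or N, found {value!r}"
    return None


def check_atom_field(value, atoms_cited):
    """A two-column atom field holds an element symbol or the number of an
    atom already cited earlier in the question."""
    text = value.strip()
    if text in ELEMENT_SYMBOLS:
        return None
    if text.isdigit() and int(text) in atoms_cited:
        return None
    return f"atom field: invalid element symbol or atom number {value!r}"


def edit_question_line(bond_value, logic, atom, atoms_cited):
    """Run all checks and return the list of diagnostics (empty if valid)."""
    checks = [
        check_numeric("bond value", bond_value),
        check_logic_operator(logic),
        check_atom_field(atom, atoms_cited),
    ]
    return [diagnostic for diagnostic in checks if diagnostic is not None]
```

A line is accepted when the returned list is empty; otherwise each entry states why the information was rejected.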
STATISTICAL SUMMARY AND COSTS 

This section of the report presents a summary of statistics, including 
costs, that were derived from the New York demonstration. More detailed 
statistics concerning screening and iterative search can be found in Appendix B. 

Throughout this section and the succeeding appendixes, the term "hit" 
is used. This term is defined to mean a structure retrieved by a search 
that satisfies the search question. The term is used to contrast 
answers that identify structures (hits) with the situation in which a search 
produces no Registry Numbers of structures that satisfy the question because 
none exist in the file. Although this latter circumstance is not defined 
as a hit, it nevertheless is a valuable piece of information. 

It should also be recalled that because of time and cost limitations, 
an arbitrary limit of six hits per search was imposed. That is, once six 
hits were retrieved for a question during a search, the search was terminated. 

TABLE I 

QUESTION/HIT STATISTICS 

1. Number of compounds on file to be searched         55,396 
2. Number of persons asking questions                    163 
3. Number of questions asked                             183 
4. Number of negative (i.e., no-hit) searches             81 
5. Number of searches with hits                          102 
6. Number of hits (max. of 6/search)                     529 
7. Projected number of hits (1)                       35,745 
8. Range of number of projected hits per 
   search (based on 102 searches) 
   a. Maximum                                         14,806 
   b. Minimum                                              1 

(1) Projected figures estimate the number of hits if the limit of 6 hits per 
    question were not imposed. 

(2) Because of the 6-hit limit, not all compounds passing screens were 
    searched atom-by-atom. Instead, atom-by-atom search continued only 
    until 6 hits resulted. 
TABLE II 
COMPUTER TIMES (1) 

                                              Actual                  Projected 
                                       (limit-6 hits/question)        (no limit) 

1. Total Searching Times 
   a. Screening                               6.96 hrs.               6.96 hrs. 
   b. Iterative search                       12.44                   47.40 
   c. Total search time                      19.40                   54.36 

2. Average Searching Times 
   a. Screening time per question 
      (based on 183 questions)                0.038                   0.038 
   b. Iterative search time per question      0.068                   0.259 
   c. Search time per question                0.106                   0.297 
   d. Search time per hit                     0.037 (2)               0.0016 (3) 

(1) Only iterative search times are affected by the limit of six hits 
    per question since screening is performed on the entire file while 
    iterative search was terminated after six hits were retrieved. 

(2) Based on the 529 hits actually retrieved. 

(3) Based on a projected number of hits of 35,745. 
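
The averages in Table II follow from the totals by simple division, and a short Python sketch makes the arithmetic easy to recheck. It assumes the totals are 6.96 hours of screening and 12.44 (actual) or 47.40 (projected) hours of iterative search, spread over 183 questions; the function and variable names are illustrative only.

```python
# Recomputing the Table II averages from the totals (names are illustrative).
QUESTIONS = 183
HITS_ACTUAL = 529
HITS_PROJECTED = 35_745


def averages(screening_hrs, iterative_hrs, hits):
    """Return per-question and per-hit times in hours."""
    total = screening_hrs + iterative_hrs
    return {
        "screening_per_question": round(screening_hrs / QUESTIONS, 3),
        "iterative_per_question": round(iterative_hrs / QUESTIONS, 3),
        "search_per_question": round(total / QUESTIONS, 3),
        "search_per_hit": total / hits,
    }


actual = averages(6.96, 12.44, HITS_ACTUAL)
projected = averages(6.96, 47.40, HITS_PROJECTED)
```

Under these assumptions the per-question figures reproduce the table. The projected time per hit computes to roughly 0.0015 hours, slightly below the 0.0016 printed in the table.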



Demonstration Costs 

The following table compares the search costs incurred during the 
New York demonstration to those incurred during the demonstration held at 
CAS in November, 1965. This table should be used carefully since there 
were differences between the two demonstrations that have an effect upon 
the results. These differences are: 

1. Although the search file was identical for both demonstrations, 
   the screens used were not. The screens used for the earlier 
   demonstration were not all a part of the bit indicator record. 
   Therefore, screening was much less efficient. 

2. There was a limit of 15 hits per search question placed on iterative 
   (atom-by-atom, bond-by-bond) search during the earlier demonstration; 
   however, only one search was affected. A limit of six 
   hits was placed upon the number of hits per search during the New 
   York demonstration, and several questions were affected by the limit. 

3. During the earlier demonstration, 25 questions were asked. Of these, 
   20 were iteratively searched. 

4. The earlier demonstration used an IBM 1410 computer while the 1966 
   demonstration used an IBM 7010 computer. 

Besides being able to compare this demonstration with the one that took 
place earlier, we are able to make some limited judgements as to costs that 
will be incurred when the 360 Substructure Search System becomes operational. 
Based upon preliminary cost estimates, a reduction of some 60% in search 
costs is anticipated at that time. This is, in part, due to the increased 
speed of the computer and, in part, due to the shift from a breadboard to 
a designed search system. Based upon the 183 questions and the 529 hits 
developed at the demonstration, the cost per question for the operational 
system will be approximately $15.50 and the cost per answer, $0.08, as 
compared to $38.61 per question and $0.21 per answer for the demonstration, 
based upon the projected figures. Neither set of figures includes the cost 
of generating the screen file since the allocation of such costs is dependent 
upon the size and character of the search file and the number of questions 
to be run against the file during its useful life. 
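
The anticipated reduction of some 60% can be checked against the quoted unit costs. The sketch below is only a worked calculation; the function name is illustrative.

```python
def percent_reduction(old_cost, new_cost):
    """Percentage by which new_cost is lower than old_cost."""
    return (old_cost - new_cost) / old_cost * 100.0


# Unit costs quoted in the text (dollars).
per_question = percent_reduction(38.61, 15.50)  # demonstration vs. operational
per_answer = percent_reduction(0.21, 0.08)
```

Both results come out close to 60%, consistent with the estimate given above.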
GLOSSARY 



Bit Indicator Screen - A screen in which a series of binary digits (bits) 
are each assigned a yes-no relationship for the presence of a given structural 
feature in a given compound. 

Chemical Fragment - A well-defined grouping of atoms and bonds thought of 
as an entity, and from which bonding to other elements may or may not be 
well defined. 

Connection Table - A computer-based linear notation consisting of an 
atom-by-atom, bond-by-bond inventory that shows each atom, the atoms connected 
directly to it, and the types of linking bonds. Mass number, coordination 
number, valence, and charges are shown whenever they are required for exact 
identification. (Stereochemical data are included but were not machine 
searchable during this demonstration.) The connection table is comprised 
of the F-1, F-2, F-3, and F-4 records. 

    F-1 Record - That portion of the connection table that describes only 
    the graph of the corresponding structural diagram; that is, only the 
    connection network without specifying the character of the bonds (i.e., 
    line values) or the node identities (i.e., the element symbols 
    corresponding to the atoms in the diagram). Hydrogen atoms are not 
    included in the definition of the graph. 

    F-2 Record - That portion of the connection table that identifies the 
    type of atom corresponding to each network node appearing in the graph 
    of the F-1 record. 

    F-3 Record - That portion of the connection table that specifies the 
    bonding character of each line appearing in the graph of the F-1 record. 

    F-4 Record - That portion of the connection table that describes the 
    qualifiers of the two-dimensional structure diagram. This portion of 
    the record contains stereochemical descriptors, non-routine valence, 
    isotopic number, hydrogen-atom count, and Registry Number. 

Iterative Search - An atom-by-atom, bond-by-bond comparison between the 
substructure defined in a question and the connection tables on file. This 
process provides only exact answers to search questions. 

Moiety - A chemical fragment. 

Percent Screenout - Percent screenout is defined by the mathematical expression 

    (No. of Compds. Eliminated by Screening / Total No. of Compds. in File) x 100 

Question Coding - The translation process required to convert a search question 
into the symbolic form required by the computer program. 

Registration - The process of determining the existence or absence of a 
substance in the Registry Files. The process includes the assignment of a 
Registry Number (see below) to each substance that is new to the files. 

Registry Number - The unique nine digit (the ninth digit is a computer-calculated 
check digit) number which is assigned to each substance when it 
first enters the Registry and which is recalled each time that substance is 
checked against the file. The Registry Number may be used to identify 
the substance, and it is used as the address in specialized subject 
files to identify data associated with the substance. In the Registry 
System, a Registry Number also is the file address for bibliographic and 
nomenclature data related to the corresponding compounds. 

Registry System - The interrelated set of files directly associated with 
registration and the processes for accomplishing registration. These computer 
files include structural records, the molecular formulas, nomenclature, 
and bibliographic data. 

Screen - A common structural characteristic identified in the search files 
as part of the corresponding structural diagram. The individual screens 
are selected partly on the basis of the frequency with which they appear 
as a part of a substructure search question, and partly on the basis of 
the frequency with which they appear in the search file. In the search 
system, a set of screens amounts to a conveniently arranged series of yes-no 
answers to commonly asked substructure search questions. 

Screen Dictionary - A listing that defines the screens available for 
substructure searching. 

Substructure - A specified set of atoms interconnected in a specified way; 
this constellation normally represents less than a complete molecule. (Cf. 
Chemical Fragment) 
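
The Percent Screenout entry above reduces to one line of arithmetic. The sketch below assumes the demonstration file size of 55,396 compounds and takes "eliminated" to mean the compounds that did not pass the screens; the function name is illustrative.

```python
FILE_SIZE = 55_396  # compounds in the demonstration file


def percent_screenout(compounds_passing, file_size=FILE_SIZE):
    """Percentage of the file eliminated by screening."""
    eliminated = file_size - compounds_passing
    return eliminated / file_size * 100.0
```

For example, a question that leaves 3256 compounds after screening has a screenout of about 94.12 percent, and a question that no compound passes has a screenout of 100 percent.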
SCREENING AND ITERATIVE SEARCH DATA 

This appendix contains both detailed and summary data concerning the 
screening and iterative searches conducted during the demonstration. 

Table I, the Summary, presents quartile figures where appropriate to enable 
the reader to better evaluate the spread of various data. Table II presents 
the raw data as a function of each individual search question. 

SCREENING AND ITERATIVE SEARCH DATA 
SUBSTRUCTURE SEARCH DEMONSTRATION 

NEW YORK, SEPTEMBER 1966 



1 



ri 



Question 


Compounds 


Percent 


Number \ 


Projected 
Number ✓ 


Number 


Passing Screens 


Screenout 


of Answers'^*' 


of Answers^' 


1 


201 


99-6 


0 


0 


2 


194 


99.7 


0 


0 


3 


225 


99-6 


0 


0 




206 


99.6 


6 (63) 


20 


5 


55^ 


, 90.0 


6 (229) 


144 


6 


6 


99.99 


0 


0 


7 


1072 


98.07 


6 (472) 


15 


8 


789 


98.58 


0 


0 


9 


68 


99-88 


0 


0 


10 


0 


100.0 


0 


0 


11 


718 


98. 71 


0 


0 


12 
1 “7 


10 


99.99 


5 


5 


13 


3 


99-99 


0 


0 


14 


2 


99-99 


0 


0 


13 

TK 


2483 

- 


95-52 


6 (1267) 


11 



17 

18 

19 

20 
~2T 
22 
25 
24 
23 



4216 
1055 
^5 
13 



27 

28 

29 

30 



450 

965 

77 

1 

2590 



7T 

32 

33 

34 

35 



108 

556 

40 

172 

395 



92.39 

98.07 

99-92 

99-99 

■99- 19 

98.27 

99-88 

99-99 

95-33 



0 

0 

6 

0 



(14) 



1016 

43 

525 

975^ 

156 



99 - 81 
99-0 
99-93 
99-79 

99-29 



“6 (.39; 
0 

6 ( 9 ) 

0 

6 (17) 



0 
0 
0 

19 

0 



98.17 

99-92 

99-06 

82.40 

99-72 



6 (26) 

6 (226) 
0 
2 

6 ( 6 ) 

“6 l35s; 
0 

6 (13) 

0 

0 



69 

0 

51 

0 

911 



14 

0 

2 

595 



16 

0 

262 

0 

0 



(1) For questions which had fewer than 6 hits, all compounds passing the 
    screens had to be iteratively searched. For questions which reached 
    the maximum cutoff of 6 hits, the number in parentheses indicates the 
    number of compounds which had to be iteratively searched up to that 
    point. 

(2) For questions which had the maximum cutoff of 6 hits, the projected 
    number of hits was calculated as follows: 

    Projected number = (Number of compounds passing screens / Number of iterative searches) x 6 
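
The projection rule in footnote (2) can be written as a short Python function. The rounding to the nearest whole number (halves rounded up) is an assumption chosen to agree with table rows such as 426 compounds passing, 8 searched, 320 projected; the function name is illustrative.

```python
HIT_LIMIT = 6  # searches were cut off after six hits


def projected_hits(compounds_passing, compounds_searched, hits_found):
    """Estimate the hits an unlimited search would have produced."""
    if hits_found < HIT_LIMIT:
        # The whole screened set was searched, so the count is already final.
        return hits_found
    estimate = compounds_passing * HIT_LIMIT / compounds_searched
    return int(estimate + 0.5)  # assumed rounding: nearest, halves up
```

For instance, 360 compounds passing the screens with 6 hits found after 12 iterative searches projects to 180 hits.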
SCREENING AND ITERATIVE SEARCH DATA (cont'd) 



Question 

Number 

36 

37 

38 

39 

40 

41 



42 

43 

44 
43 



46 

47 
43 
49 
_30 



51 

52 

53 

54 



56 

57 

58 

59 

60 



63 

64 

65 



66 

67 

68 

69 

70 



TT 

72 

73 

74 

21 . 

76 

77 

78 

79 

80 



Compounds 
Passing Screens 



8 

81 
18 
2683 
136 
367 
409 
18 
319 
337 



32- 
3239 
55 
2493 
321 



2063 

36 

34 

1727 

36 



315 

493 
723 
232 

23 

24,243 

1217 

201 

1824 



946 

47 

56 

2 

2043 



9 

1300 

0 

1364 
1637 
79 
849 

3139 

367 

46 



Percent 

Screenou t 

99.99 

99.88 

99.97 

95.16 

99.72 

99.34 

99.2 

99.97 
99.43 
99.4 



99:^ 

94.12 

99.83 

95.3 

99.4 



96.27 

99.9 

99.9 

97.89 

99.9 



99.44 

99.12 

98.7 

99.6 

99-96 

56.24 

97-81 

99.6 

96.71 



98- 3 

99- 9 
99-85 

99.99 

96-3 



99.99 

97.30 

100.0 

97.52 

97.0' 

99.86 

98.47 

94.20 

99.4 

99.9 



Number ( 1 ) 
of Answers 



1 

2 

0 

1 

2_ 

0 

6 

1 

6 

6 



(93) 

( 22 ) 

(244; 



0 

0 

0 

0 

6 (15) 



Tnjir 

1 

4 

6 (47) 

6 (24) 




0 

6 

1 

6 

2 

5 

6 
6 
6 
6 



(33) 

(13) 



(175) 

(71) 

( 10 ) 



0 

0 

0 

1 
5 



0 

0 

0 

0 



0 

6 (240) 
1 
0 



Projected 
Number (2) 
of Answers 

1 
2 
0 
1 
2 
0 

26 
1 
87 
9 



0 

0 

0 

0 

128 



TI3o" 

1 

4 

218 
9_ 



0 

89 
1 

126 

5 
824 

101 

80 

1094 



0 

0 

0 

1 

5 




0 

0 

0 

0 

0 



0 

78 

1 

0 
SCREENING AND ITERATIVE SEARCH DATA (cont'd) 



Question    Compounds          Percent      Number of      Projected Number 
Number      Passing Screens    Screenout    Answers (1)    of Answers (2) 

81          26                 99.96        0              0 
82          17                 99.97        0              0 
83          13                 99.97        0              0 
84          21                 99.97        6 (6)          21 
85          1857               96.6         2              2 
86          21                 99.96        6 (7)          20 
87          0                  100.0        0              0 
88          0                  100.0        0              0 
89          475                99.15        6 (6)          475 
90          [illegible]        99.9         0              0 
91          143                99.75        5              5 
92          * 
93          1                  99.99        0              0 
94          1059               98.07        6 (87)         72 
95          0                  100.0        0              0 
96          239                99.6         0              0 
97          4596               91.71        6 (26)         1057 
98          36                 99.9         0              0 
99          4117               92.3         6 (1455)       17 
101         10                 99.99        6 (6)          10 
102         0                  100.0        0              0 
103         24                 99.96        6 (6)          24 
104         11                 99.99        6 (7)          10 
105         230                99.6         5              3 
106         4                  99.99        4              4 
107         868                98.45        5              5 
108         5963               89.24        6 (364)        95 
109         0                  100.0        0              0 
110         360                99.4         6 (12)         180 
111         4579               91.72        6 (814)        32 
112         329                99.41        0              0 
113         0                  100.0        0              0 
114         20                 99.98        0              0 
115         253                99.6         6 (6)          253 
116         1294               97.37        5              5 
117         32                 99.95        0              0 
118         425                99.24        6 (319)        8 
119         0                  100.0        0              0 
120         [illegible]        99.99        0              0 
121         16,716             69.83        6 (illegible)  67 
122         1426               97.43        6 (128)        66 
123         51                 99.8         0              0 
124         1278               97.4         6 (285)        27 
125         324                99.4         0              0 

* Bibliography search only was performed on four specific compounds 
  which had been identified by structure and Registry Number. 
SCREENING AND ITERATIVE SEARCH DATA (cont'd) 



Question    Compounds          Percent      Number of      Projected Number 
Number      Passing Screens    Screenout    Answers (1)    of Answers (2) 

126         7                  99.99        0              0 
127         187                99.7         0              0 
128         259                99.6         0              0 
129         1463               97.35        0              0 
130         2822               94.88        6 (1550)       9 
131         550                99.4         3              5 
132         2685               95.16        6 (1043)       13 
133         317                99.07        0              0 
134         0                  100.0        0              0 
135         2863               99.88        6 (160)        106 
136         3256               94.12        6 (illegible)  16 
137         326                99.08        6 (115)        27 
138         396                98.9         6 (6)          396 
139         426                99.24        6 (8)          320 
140         134                99.76        1              1 
141         0                  100.0        0              0 
142         17                 99.98        3              3 
143         21                 99.96        6 (6)          21 
144         81                 99.86        0              0 
145         41                 99.94        0              0 
146         49,630             10.41        6 (6297)       43 
147         0                  100.0        0              0 
148         0                  100.0        0              0 
149         149                99.74        6 (6)          149 
150         34                 99.95        0              0 
151         1153               97.96        1              1 
152         795                98.57        6 (14)         340 
153         27,896             49.65        6 (226)        725 
154         203                99.64        0              0 
155         5954               89.24        6 (17)         1985 
156         2290               93.87        6 (593)        23 
157         493                99.12        0              0 
158         336                99.4         6 (90)         22 
159         6451               88.36        6 (9)          4296 
160         3553               95.59        6 (12)         1777 
161         14,919             73.07        6 (143)        612 
162         14,360             74.08        6 (1895)       43 
163         1852               96.6         0              0 
164         260                99.6         0              0 
165         274                99.61        0              0 
166         1346               97.33        6 (465)        16 
167         1343               97.22        6 (69)         153 
168         0                  100.0        0              0 
169         133                99.72        6 (16)         58 
170         6                  99.99        0              0 
For the Substructure Search demonstration, CAS utilized eight different 
screens that together contained approximately 1350 screen items, as listed 
in Table C-I below. A screen item may include not only the identification 
of a structural feature, but also a numerical indication of the number of 
times it appears in a structure. For example, one screen item might require 
one occurrence of the fragment C-C in a compound, while another 
screen might require two occurrences of the same fragment. Each screen 
item is chosen according to chemists' intuition and the results of earlier 
experience; each selected item is assigned one of the 2000 bit-indicator 
positions set aside for that purpose. 

TABLE C-I 

Screen                                         Number of Screen Items 

Atom Counts                                              44 
Ring Counts                                              20 
Element Counts                                          110 
Bond Counts                                             114 
Atom-Bond-Atom "Triplets"                               4[?]9 
First-Level Connectivities ("Moieties")                 502 
Salt, Ammoniate, and Hydrate Fragments                   44 
Ring Sizes and Specific Structural 
Characteristics                                          31 

All the screen items taken together constitute a screen dictionary from 
which appropriate screens are identified for each Substructure Search question. 
The frequency with which each screen item occurred in the demonstration 
file of 55,396 compounds was also available to assist in question coding. 
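
Because each screen item occupies one bit-indicator position, screening a file amounts to testing whether every bit required by the question is set in a compound's record. The following Python sketch illustrates the idea with integers used as bit strings. The bit positions, record keys, and function names are hypothetical and do not describe the actual CAS record layout.

```python
def make_record(bit_positions):
    """Build a bit-indicator record with the given positions set."""
    record = 0
    for position in bit_positions:
        record |= 1 << position
    return record


def passes_screens(compound_record, question_mask):
    """True when every screen bit required by the question is present."""
    return (compound_record & question_mask) == question_mask


def screen_file(records, question_mask):
    """Return the keys of all compounds that pass the screens."""
    return [key for key, record in records.items()
            if passes_screens(record, question_mask)]
```

Only the compounds returned by `screen_file` would go on to the atom-by-atom iterative search, which is why a high percent screenout shortens total search time.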
SCREEN TYPES 



Explanation of Screen Types 

The following explanations and examples illustrate the screen types 
used in the demonstration. The "Appropriate Screen Items" specified in each 
example apply only to the screen type under discussion. In actual searches, 
appropriate screen items are chosen from several screen types. 



1. ATOM COUNT SCREEN eliminates all compounds having fewer than a 
   specified number of nonhydrogen atoms. 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any nonhydrogen atom 

   Appropriate Screen Item:  Require a count of 9 or more 
                             atoms for each potential 
                             answer. 



2. RING COUNT SCREEN eliminates all compounds having fewer than a specified 
   number of rings. Rings are defined according to the Ring Index rules 
   (i.e., the minimum number of scissions of ring bonds required to 
   produce a completely acyclic structure). 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any nonhydrogen atom 

   Appropriate Screen Item:  Require a count of 2 or more 
                             rings for each potential 
                             answer. 
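
The ring count defined above (the minimum number of ring-bond scissions needed to leave an acyclic structure) equals the number of bonds minus the number of atoms plus the number of connected components. The Python sketch below computes it from a list of bonded atom pairs. The input representation is an assumption for illustration and is not the F-1 record format.

```python
def ring_count(num_atoms, bonds):
    """Number of ring closures: bonds - atoms + connected components.

    bonds is a list of distinct (i, j) atom-index pairs, hydrogens excluded.
    """
    parent = list(range(num_atoms))

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components
```

With this definition a single six-membered ring counts as one ring and a naphthalene skeleton as two, so the latter would pass a screen requiring two or more rings.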



3. ELEMENT COUNT SCREEN eliminates those structures having fewer than a 
   specified number of atoms of a given element or elements. 

   There is at least one screen item in the Element Composition category 
   for each element of the periodic table. The more commonly occurring 
   elements (e.g., C, N, O, S, and halogens) have additional screen items 
   to allow for higher frequency counts. For example, in addition to the 
   screen item for a single Cl in a structure, there are screen items for 
   two, three, five, seven, and nine or more Cl's in a structure. 

   EXAMPLE: 

   Question:                  [structure diagram not reproduced; it includes 
                              two Cl atoms] 
                              X = any nonhydrogen atom 

   Appropriate Screen Items:  Require 
                              2 or more Cl 
                              9 or more C 
                              1 or more R 
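
Since screen items exist only for certain counts, question coding has to pick the strongest available item that the question still satisfies. The Python sketch below illustrates this under two assumptions: that each Cl item means "this many or more", and that the available thresholds are the ones listed above (1, 2, 3, 5, 7, 9). The function names are illustrative.

```python
from collections import Counter

CL_THRESHOLDS = (1, 2, 3, 5, 7, 9)  # counts with a Cl screen item, per the text


def best_screen_item(required_count, thresholds=CL_THRESHOLDS):
    """Largest available threshold that does not exceed the required count."""
    usable = [t for t in thresholds if t <= required_count]
    return max(usable) if usable else None


def passes_element_counts(compound_atoms, required):
    """True if the compound has at least the required number of each element."""
    counts = Counter(compound_atoms)
    return all(counts[element] >= minimum
               for element, minimum in required.items())
```

A question containing four Cl atoms would therefore be coded with the "3 or more Cl" item; compounds with exactly three Cl atoms pass the screen and are rejected later by the iterative search.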



4. BOND TYPE AND COUNT SCREEN eliminates all structures having fewer than 
   a specified number of a given type of bond or bonds. Bond types are 
   defined as follows: 

   Bond Symbol    Bond Significance 

   1              Single bond, acyclic 
   2              Double bond, acyclic 
   4              Triple bond, acyclic 
   B              "Don't Care" (any bond is acceptable) 
   G              Ring bond 
   J              Single bond, cyclic 
   K              Double bond, cyclic 
   L              Any bond that is part of a fully conjugated system of 
                  single and double ring bonds 
   M              Triple bond, cyclic 
   [illegible]    Any chain bond 

   EXAMPLE: 

   Question:                  [structure diagram not reproduced; it includes 
                              an O=N=O group] 
                              X = C, N, or O 

   Appropriate Screen Items:  Require 
                              6 or more L bonds 
                              1 or more 1 bonds 
                              2 or more 2 bonds 



5. ATOM-BOND-ATOM "TRIPLETS" SCREEN eliminates all structures having fewer 
   than a specified number of "triplets". A triplet is defined by identifying 
   two connected atoms and their connecting bond. Atom-bond-atom 
   triplets are included only for the 12 most populous elements on file: 
   B, Br, C, Cl, F, I, N, O, P, S, Si, Sn. Specific bond types are identified 
   as listed in Screen 4 for most pairs of elements, but for less common 
   elements (i.e., B, I, Si, Sn), many screen items describe only 
   generic-level bonds. 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any atom 

   Appropriate Screen Item:  Require 
                             11 or more C-C triplets 

6. FIRST-LEVEL CONNECTIVITY SCREEN ("MOIETIES") eliminates all structures 
   having fewer than a specified number of "moieties." Moieties are defined 
   as the number and type of atoms attached to a central atom together 
   with their connecting bonds. Moiety descriptions are included only 
   for the six most common polyvalent elements, namely: C, N, O, P, S, Si. 
   All bond descriptions in this group are specific. 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any nonhydrogen atom 

   Appropriate Screen Item:  Require at least one moiety in which a 
                             central C is attached to two C atoms and 
                             one O atom [moiety diagram not reproduced] 


7. SALT, AMMONIATE, AND HYDRATE SCREEN eliminates all structural records 
   that do not contain a specified atom or atoms in the "salt portion" of the 
   structural record. Screen items in this category are included only for 
   those elements known to be present in the salt, ammoniate, or hydrate 
   portion of the file. This screen does not include frequency counts. 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any nonhydrogen atom 

   Appropriate Screen Item:  Requires Na to be present 
                             in the salt portion of the 
                             record of each potential 
                             answer. 

8. RING SIZES AND OTHER SPECIFIC STRUCTURAL CHARACTERISTICS. These screens 
   eliminate all structures that do not contain certain special chemical 
   fragments not included in the above screen categories. In this category 
   there are screen items for specific ring sizes of 3 to 19 atoms inclusive 
   plus a screen item for rings containing 20 or more atoms. These ring sizes 
   are applicable to all possible cyclic paths in a structure (e.g., anthracene 
   has rings of 6, 10, and 14 atoms). Other screen items included in 
   this category are various groups such as 6-membered carbocycle, steroid 
   nucleus, etc., and some generic screens such as any metal, any halogen, 
   any hydrocarbon. 

   EXAMPLE: 

   Question:                 [structure diagram not reproduced] 
                             X = any nonhydrogen atom 

   Appropriate Screen Item:  Requires a six-membered 
                             heterocycle plus other 
                             applicable screens for 
                             each potential answer. 
DEMONSTRATION FILE CHARACTERISTICS 












TABLE D-I 

NUMBER OF OCCURRENCES OF VARIOUS ELEMENTS* 

Elem.   No. of          Elem.   No. of          Elem.   No. of 
        Occurrences             Occurrences             Occurrences 

Ag      7               Fe      10              Po      6 
Al      69              Ga      26              Pt      6 
As      326             Ge      221             Ru      1 
Au      1               Hg      319             S       20,803 
B       819             I       732             Sb      99 
Ba      1               K       8               Se      247 
Be      1               La      2               Si      1,390 
Bi      17              Li      34              Sn      534 
Br      2,639           Mg      35              Sr      1 
C       -               Mn      2               T       38 
Ca      9               Mo      3               Ta      2 
Cd      1               N       >55,000         Te      42 
Cl      8,701           Na      15              Th      1 
Co      3               Nb      1               Ti      18 
Cr      20              Ni      3               Tl      17 
Cu      18              O       >55,000         U       1 
D       256             P       5,548           V       44 
Eu      1               Pb      98              Zn      55 
F       7,438           Pd      2               Zr      9 

*Redundancy exists in this Table since the figures are based on the number 
of occurrences of compounds containing a specific number and type of 
atom-bond-atom combinations, or "triplets". For example, methane sulfonic 
acid contains one C-S bond, two S=O bonds and one S-O bond; this 
accounts for three occurrences of sulfur in the Table. 
TABLE D-II 



NUMBER OF COMPOUNDS CONTAINING VARIOUS ELEMENTS 
AS SALT, AMMONIATE, OR HYDRATE FRAGMENTS 

Metal Salts 

Elem.   No. of          Elem.   No. of 
        Compds.                 Compds. 

Ag      21              K       113 
Al      9               Li      18 
Au      14              Mg      12 
Ba      23              Mn      10 
Be      3               Na      445 
Bi      1               Nd      1 
Ca      45              Ni      3 
Cd      2               Pb      8 
Ce      1               Pd      1 
Co      10              Pr      1 
Cs      8               Pt      1 
Cu      25              Rb      2 
Dy      1               Sb      1 
Eu      1               Sn      3 
Fe      9               Sr      2 
Ga      1               V       1 
Hg      9               Zn      26 
Ho      1               Zr      2 

Non-Metal Salts*        Other 

Elem.   No. of          Elem.     No. of 
        Compds.                   Compds. 

Br      595             B(BH3)    13 
Cl      3108            N(NH3)    63 
F       16              O(H2O)    15 
I       507 

*Includes only single-atom anions 
TABLE D-III 






ELEMENTS NOT APPEARING IN ANY COMPOUNDS 
OF THE DEMONSTRATION FILE 

Ac      Fr      Ne      Rn 
Am      Gd      No      Sc 
Ar      He      Np      Sm 
At      Hf      Os      Tb 
Bk      In      Pa      Tc 
Cf      Ir      Pm      Tm 
Cm      Kr      Pu      W 
Er      Lu      Ra      Xe 
Es      Lw      Re      Y 
Fm      Md      Rh      Yb 
[Figure 1: graph not reproduced. Vertical axis: Percentage of Compounds 
(0 to 100). Horizontal axis: Number of Nonhydrogen Atoms Per Compound 
(10 to 90).] 

Figure 1 - Distribution of Compounds Containing Different Numbers of 
Nonhydrogen Atoms 
Figure 2 - Distribution of Compounds Containing Different Numbers of Rings 
TABLE D-IV 



NUMBER OF COMPOUNDS CONTAINING VARIOUS TYPES OF COVALENTLY 
BONDED ATOM-PAIRS ("TRIPLETS") IN THE DEMONSTRATION FILE* 

Atom-    Bond    No. of        Atom-    Bond    No. of        Atom-    Bond    No. of 
pair     Type    Compounds     pair     Type    Compounds     pair     Type    Compounds 

Br-C     1**     2[?]30        C-N      4       1796          Cl-Si    1       160 
Br-N     1       9             C-O      J       6859          F-I      1       1 
Br-O     1       1             C-O      K       2             F-O      1       1 
Br-O     2       1             C-O      L       76            F-P      1       196 
Br-P     1       4             C-O      1       29723         F-S      1       148 
Br-S     1       1             C-O      2       28241         F-Si     1       31 
Br-Si    1       6             C-O      4       1             I-I      1       8 
C-C      J       26396         C-P      J       54            I-O      J       20 
C-C      K       10820         C-P      K       1             I-O      1       23 
C-C      L       3559[?]       C-P      1       1121          I-O      2       2 
C-C      M       10            C-P      2       79            I-P      1       1 
C-C      1       47822         C-S      J       3246          I-S      1       1 
C-C      2       7064          C-S      K       1             I-Si     1       1 
C-C      4       885           C-S      L       5             N-N      J       1660 
C-I      J       22            C-S      1       6464          N-N      K       258 
C-I      1       595           C-S      2       1247          N-N      L       170 
C-N      J       13490         Cl-I     1       3             N-N      1       2623 
C-N      K       4530          Cl-N     1       53            N-N      2       1092 
C-N      L       4858          Cl-O     1       246           N-N      4       201 
C-N      M       1             Cl-O     2       249           N-O      J       352 
C-N      1       27396         Cl-P     1       127           N-O      1       972 
C-N      2       3945          Cl-S     1       115           N-O      2       5332 

*... F, I, N, O, P, S, Si) is bonded to another are included in this table. 
**See Appendix C, Screen 4 for identification of bond types. 
LIST OF QUESTIONERS AND THEIR AFFILIATIONS 

REMOTE SUBSTRUCTURE SEARCH DEMONSTRATION 

NEW YORK, SEPTEMBER 1966 



Name 

Aaland, Mrs. Sharon 
Aszalos, Dr. A. 

Babad, Dr. Harry 
Barton, T. J. 

Bauman, Robert 

Benson, Dr. F. R. 

Berezin, Dr. G. H. 

Berger, Dr. 

Bernier, Dr. Charles L. 
Bonanno, S. R. 

Bose, Dr. A. K. 



Affiliation 
Abbott Lab. 

North Chicago, Ill. 60064 

Squibb Inst. 

New Brunswick, N. J. 

Univ. of Denver 
Dept. of Chem. 

Univ. of Florida 
Dept. of Chemistry 
Gainesville, Fla. 32601 

Colgate-Palmolive Co. 

909 River Road 

Piscataway, N. J. 08854 

Atlas Chem. Ind. 

Wilmington 99, Delaware 
Manager, Information Section 

DuPont Company 
Explosives Dept. 

Experimental Station Lab. 
Wilmington, Delaware 

Baxter Labs. 

Morton Grove, Ill. 

The Squibb Inst. for Medical Res. 
Georges Road 

New Brunswick, N. J. 08903 

The Squibb Inst. for Medical Res. 
Georges Road 

New Brunswick, N. J. 08903 

Dept. of Chemistry 

Stevens Institute of Technology 

Hoboken, N. J. 07030 



Name 

Boyack, Dr. G. A. 

Braswell, Dr. E. H. 
Bristol, D. W. 

Brown, Horace D. 
Burt, Dr. G. D. 

Byck, Joseph S. 

Cardeilhac, Dr. P. T. 

Casey, J. P. 

Chakrin, A. L. 
Chisolm, R. A. 

Cinnamon, J. M. 



Affiliation 

The Upjohn Company 
Kalamazoo, Michigan 

Univ. of Conn. 

Storrs, Conn. 

Chem. Dept. 

Syracuse Univ. 

Syracuse, N. Y. 13210 

Merck and Co., Inc. 
Rahway, N. J. 

Harshaw Chemical Co. 

Cleveland, Ohio 44106 

Box 408 Havemeyer 
Columbia University 
New York, N. Y. 10027 

Dept. of Physiology and Pharm. 
Oklahoma State Univ. 
Stillwater, Oklahoma 

Univ. of Virginia 
Dept. of Chemistry 
Charlottesville, Va. 22903 

Univ. of Chicago 

3M W. Bldg. 201-25 
St. Paul, Minn. 55101 

Shulton 



Clarke, Dr. Donald D. 
Crawford, Thomas H. 

Culvenor, C. C. J. 

De Stephen, Tony 

Donovan, Miss Kathryn M. 
Drew, Dr. Howard P. 



Fordham Univ. 

Dept. of Chemistry 
Univ. of Louisville 
Louisville, Ky. 40208 

CSIRO, Australia 

Harshaw Chem. Co. 
Cleveland, Ohio 44106 

Pennsalt Chemicals Corp. 
900 First Ave. 

King of Prussia, Pa. 19406 

Proctor and Gamble 
Research Division 
Miami Valley Labs. 
Name 

DuDock, Dr. B. S. 
Dutton, Herbert 
Ebert, Miss Helen M. 



Affiliation 

Dept. of Biochemistry 
Cornell Univ. 

Ithaca, N. Y. 

Northern Regional Research Lab. 
1815 N. University 
Peoria, Ill. 

Smith, Kline and French 



Eddy, Dr. L. P. 

Elston, Dr. C. T. 

Fallon, Dr. Frances 
Fetterolf, Dr. L. M. 

Finkbeiner, Dr. 

Foote, Dr. H. E. 
Fraction, George 

Franck, Dr. Richard W. 
Frank, Dr. S. 

Friedman, Dr. Herbert A. 
Gans, Richard 
Garwig, Paul L. 



Western Washington 
State College 

DuPont of Canada 
Research Center 
Kingston, Ontario 

The Wm. S. Merrell Co.
Cincinnati, Ohio 45215 

Smith, Kline and French 
1500 Spring Garden Street 
Philadelphia, Pa. 

General Electric Res. 

Box 8 

Schenectady, N. Y. 

Avi Publ. Co. 

Eli Lilly and Co. 
Indianapolis, Ind. 46205 

Chemistry Dept. 

Fordham Univ. 

Bronx, N. Y. 10458 

American Cyanamid Co. 
Central Research Div. 
Stamford, Conn. 

Sloan-Kettering Institute 
145 Boston Post Road 

Rye, N. Y. 10580 

Frick Chem. Lab.

Princeton Univ.

Princeton, N. J. 08540

F.M.C. Corp.
Box 8 

Princeton, N. J. 



Name 



Affiliation 



Gassmann, Dr. Paul

Gelberg, Alan
Gerson, H.

Giddings, N. P.
Giner-Sorolla, Dr. A.

Goldstein, Edward J.

Gosink, T. A.

Gough, Dr. S. T. D.

Gould, Dr. David 
Grindahl, G. A. 

Gruen, H. 

Gudmunsen, Dr. C. H. 
Guiduci, Dr. M. A. 
Gunther, Dr. W. H. H. 

Haarstad, Dr. V. B.
Haggard, Dr. R. A. 



Chem. Dept. 

Ohio State University 



Diamond Alkali Company 

Allied Chemical Corp.
Box 14

Hawthorne, N. J.

Pacific Lutheran Univ.
Tacoma, Washington

Sloan-Kettering Inst.
410 E. 68th Street
New York

Colgate-Palmolive
909 River Road
Piscataway, N. J.

Old Dominion College
Norfolk, Va. 23508



Mobil Chem. Co. 

Metuchen, N. J. 

Colgate-Palmolive Center 
Piscataway, N. J. 

Dow Corning Corp.
Midland, Michigan 

Binghampton, N. Y. 



Wyeth Labs. Div.

Radnor, Pa. 19101

E. R. Squibb

New Brunswick, N. J.

Yale Univ.

333 Cedar Street
New Haven, Conn.

Tulane Univ.

New Orleans, La.

Rohm and Haas Co.
Springhouse, Penn. 19477



Esso Research
P. O. Box 51
Linden, N. J.



Hall, Dr. H. J.









Name 

Hamaker, Dr. J. W.
Hayward, H. N. 

Heckman, Robert A. 

Heidt, Dr. L. J. 
Hollinden, S. 

Holly, Lloyd A. 

Hopps, Dr. Harvey 

Iorio, E. James

Jacobs, Dr. R. L.

Kaback, Dr. S. M.
Kanter, M. J.
Kassel, R. J.

Kazama, Yoshiteru
Kellett, Dr. J. C.



Affiliation 
Dow Chem. 

Walnut Creek, California 

U. S. Patent Office 
R and D 
1406 G. Street
Washington, D. C. 

R. J. Reynolds Tobacco Co. 
Research Dept. 
Winston-Salem, N. C. 

M.I.T. 

Cambridge, Mass. 

Eli Lilly and Co. 

McCarty and Alabama Streets 
Indianapolis, Indiana 

Industry Liaison Office 
Research Labs. 

Edgewood Arsenal, Md.

Aldrich Chemical Co. 

2369 N. 29th Street 
Milwaukee, Wisconsin 53210 

Chemistry Dept. 

Northeastern Univ. 

Boston, Mass. 

Maume Chem. Co. 

1310 Expressway Drive 
Toledo, Ohio 

Esso Research and Eng. 
Linden, N. J. 

Dept. of Chemistry
Univ. of Ill.

Edgewood Arsenal 
Chem. Research Labs. 

Md. 

Stevens Inst. of Tech.

P. O. Box 1236
Castle Point Stn.

Hoboken, N. J. 07030

NSF

1800 K. Street, N. W.
Washington, D. C.

Name


Affiliation 


Kerber, Dr. Robert C.


Dept. of Chemistry
State Univ. of New York
Stony Brook, N. Y. 11790


Kormejn, J. 


Upjohn Co. 
Kalamazoo, Michigan 




Kriman, Dr. M. M. 


Allied Chem. Corp. 
Morristown, N. J. 




Kuntz, I. 


Enjay Polymer Labs. 

P. O. 45

Linden, N. J. 07036 




Kurtz, Arthur Peter 


Box 403



Liebman, J. P.



Lipowitz, Dr. J. 



Havemeyer Hall 
Dept, of Chemistry 
Columbia University 
New York, New York 10027 





Kwiatek, Dr. J. 


U. S. Industrial Chemicals 
1275 Section Road 
Cincinnati, Ohio 45237




LaMontagne, M. P. 


Duquesne University 
Dept, of Chemistry 






Pittsburgh, Pa. 15219


Landers, J. O.


Dept. of Chem.

Ohio State Univ. 
Columbus, Ohio 43210 




Langer, Dr. S. H.


Chem. Engr. Dept. 
Univ. of Wisconsin 






Madison, Wisconsin 




Levine, Dr. R. 


Univ. Pittsburgh 
Chemistry Department 
Pittsburgh, Pa. 15213 




Libby, Louis H. 


Research Triangle Park 
North Carolina Science and 



North Carolina 

Brooklyn College 
(Mail to: 2962 Brighton 
3th Street 
Brooklyn, N. Y. 

Dow Corning Corp. 

Midland, Michigan 
(Phys. Chem. Res. Dept.) 
Name

Liu, Mr. Joseph Ko-Chiung
Long, Gary J. 



Affiliation 

Dept. of Chemistry
McGill University 
Montreal 2, P.Q., Canada 

Dept. of Chemistry
Syracuse Univ.

Syracuse, N. Y. 13210



Longenecker, N. H. 



Fort Detrick
Fred., Md. 21701 



Lyle, Dr. R. E. 

Malkiewich, E. J.

Maizell, Dr. R. E.
Marsh, Dr. John L.

Marshall, Dr. W. J.



Dept. of Chem.

Univ. of New Hampshire
Durham, N. H.

Hoffmann-LaRoche
Nutley, N. J. 07110 

Olin Mathieson Chemical Corp. 

Ciba Pharmaceutical Co. 

Morris Ave. 

Summit, N. J. 

DuPont 

Pigments Dept. 

256 Vanderpool Street 
Newark, N. J. 












Matthews, Fred W. 
McCarthy, Miss J. 

McKelvie, Prof. Neil 

Milewich, Dr. L. 

Mitchell, Leonard D. 
Montague, Miss B. A.
Narvaeg, Dr. R. 



Canadian Industries Ltd.
McMasterville, Quebec 

Monsanto Co. 

1700 South 2nd Street 
St. Louis, Mo. 63177 

Dept. of Chemistry
City College (City U. of N. Y.)
Convent Ave. and 140 Street
N. Y. 10031 

Johns Hopkins University 
School of Medicine 
Baltimore, Md. 

Herner and Co. 

Washington, D. C. 

DuPont 

Wilmington, Delaware 

DuPont Company 
Experimental Station 
Wilmington, Delaware 
Name

Notation, Dr. A. D. 

Nutting, N. H. 
Odstrchel, Dr. G. 
Orchin, Dr. M. 
Parker, William L.
Pathak, Balai Chand 



Phillips, Dr. A. P. 
Pinkus, Dr. J. L. 
Regan, Dr.

Rice, Dr. Charles 
Roberts, Dr. D. L. 

Ross, Joseph 


Santoro, Angelo 




Schafter, Dr. C. D. 




Scheffler, Dietmar 
Schlessinger, Dr. G. G.




Schramm, William 










Affiliation

Univ. of Minnesota 
Biochemistry Dept. 

University of California

Duquesne Univ.

Univ. of Cincinnati
Dept. of Chem.

Cincinnati, Ohio 

Dow Chemical Co. 

P. O. Box 400
Wayland, Mass.

School of Pharmacy

Dept. of Medicinal Chemistry

University of Buffalo

N. Y. 14214

Burroughs Wellcome Co.

Univ. of Pittsburgh

Dept. of Chemistry

Pittsburgh, Pa. 15213

Baxter Labs.

Morton Grove, Ill.

Eli Lilly and Co.

Indianapolis, Ind. 46205

R.J. Reynolds Tobacco Co.

Winston-Salem

North Carolina 27101

Indiana University

South Bend, Indiana 46615

Hunter College

Park Ave. and 69th Street

N. Y.

Inst. Fur Documentation
6 Frankfurt/Main 1
Vogtstr. 50

Univ. of Delaware 
Newark, Delaware 

Newark College of Engineering

Chemistry Dept. 

323 High Street 
Newark, N. J. 07102 

Food and Drug Administration

Washington, D. C.

Name
Scott, P. M.

Shwayder, W. M.

Simmons, Dr. Noel

Skoza, Lorant
Slater, J.

Slavin, Donald

Smith, James H.
Srinivasan, Dr. V. R. 

Stanfield, Dr. M. K. 

Starkey, R. J. 

Stern, Dr. R. L. 

Stolow, Dr. R. D. 

Stucky, Galen 
Swartz, J.



Affiliation 

Food and Drug Directorate 
Ottawa, Canada 

Shwayder Chemical Metallurgy Corp. 
634 E. Woodbridge 
Detroit 26, Michigan 

State Univ. College 
Elmwood Avenue
Buffalo, New York 14222 

1856 85th Street 
New York, N. Y. 10028 

Manager R and D 
Southern Nitrogen Co. 

Savannah, Georgia 

Sadtler Research Labs. 

3516 Spring Garden Street 
Philadelphia, Pa. 19104 

Univ. of California 



L.S.U. 

Baton Rouge, Louisiana 

Dept. of Biochem.
Tulane Med. School
1430 Tulane
New Orleans, La. 70112

Perry Rubber Co. 

1875 Harsh Ave. S. E. 
Massilon, Ohio 

Northeastern Univ. 

Dept. of Chem.

Boston, Mass. 

Tufts Univ. 

Chem. Dept. 

Medford, Mass. 02155 

Chem. Dept.

Univ. of Illinois 

Olin Research Center 
Tech. Information Serv. 
New Haven, Conn. 



Theilheimer, Dr. William 
Thirtle, Dr. J. R.
Thompson, Dr. M. W.
Tillmanns, Dr. Emma June

Triner, W. J.



Eastman Kodak Corp. 

Rutgers University 

Atlas Chemical Industries 
Wilmington, Delaware 19899 

General Aniline and Film Corp.

Usher, Dr. D. A.



Cornell University 
Ithaca, New York 14850 
(Baker Laboratory) 



Van Cot, Dr. J. G.



DuPont Co.

Wilmington, Delaware 



Viola, A.



Northeastern Univ.
Boston, Mass.

Voo, D.

Waring, Sister Mary Grace
Weakley, M. L.

Wei, P. H. L. 



C. F. Braun and Co.
Murray Hill, N. J. 

Marymount College 
Salina, Kansas 

Nipak 

Pryor, Oklahoma 

Wyeth Labs., Inc. 
Radnor, Pa.



Wilcox, Dr. C. F.
Williamson, K.

Yaktin, H. K.

Youker, John



Chemistry Dept. 

Cornell Univ.

Ithaca, N. Y.

Dept. of Chem.

Mount Holyoke College 
South Hadley, Mass. 01075 

Hess and Clark Research Farm 
Ashland, Ohio 44805 

Rensselaer Polytechnic Inst. 
Troy, N. Y. 

Dept. of Chemistry



Young, Prof. J. A.

Kings College
Wilkes-Barre, Pa.

Name

Young, Dr. Lewis 



Affiliation 
Dow Chem. 



Zabik, Matthew J.



Zwick, Dr. M. M. 



Dept. of Entomology
Michigan State Univ.

East Lansing, Michigan 48823

American Cyanamid
1937 W. Main Street
Stamford, Conn.

EXAMPLES OF QUESTIONS AND RETRIEVED ANSWERS



This appendix contains examples of five questions presented at the
New York demonstration. Each is followed by a reproduction of the answers
retrieved by substructure search. In each case, the answers are composed
of the Registry Number of the compound containing the substructure, its
molecular formula, Preferred CA Name, bibliographic citations, and a graphic
representation of the structure.

The answer sheets are self-explanatory with the possible exception of
the bibliographic citations. A brief explanation of how to interpret
them therefore follows.

Chemical Abstracts 

An example of a CA reference is c63:p13271d. The "c" designates
that this is a CA reference, "63" is the volume number, and "p" signifies
that the reference is an abstract of a patent. The omission of the "p"
indicates that the citation is other than a patent. The numerals that
follow designate the column number. The concluding "d" in this example
represents the portion of column 13271 in which the abstract is located.

SOCMA 

References from the Synthetic Organic Chemical Manufacturers Association 
handbook start with the abbreviation "socma" followed by a dash, followed by
the page number and a letter designating the position of the reference on 
the page (e.g., socma-657b). 
Merck

References from the Merck Index of Chemicals and Drugs start with the
abbreviation "merck" followed by a dash, followed by a page number and a
letter designating the side of the two-column page on which the reference
appears (e.g., merck-0777r).

CZ 

References from Chemisches Zentralblatt start with the abbreviation
"cz" followed by a dash, followed by the page number (e.g., cz-5655).
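
The four citation forms described above can be told apart mechanically. The following is a minimal sketch in Python, not part of the demonstration system; the names Citation and parse_citation are hypothetical. It assumes that a colon separates the CA volume from the column number, as on the answer sheets, and that the Merck side letter may be any single letter.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    source: str              # "CA", "SOCMA", "Merck", or "CZ"
    volume: Optional[int]    # CA volume number; None for the other sources
    is_patent: bool          # True only for CA abstracts of patents ("p")
    number: int              # column number (CA) or page number (others)
    position: Optional[str]  # trailing letter, if the form carries one


# Patterns follow the prose description; the colon in the CA form is an
# assumption taken from the answer sheets, not from the explanatory text.
_CA = re.compile(r"^c(\d+):(p?)(\d+)([a-z]?)$")
_SOCMA = re.compile(r"^socma-(\d+)([a-z])$")
_MERCK = re.compile(r"^merck-(\d+)([a-z])$")
_CZ = re.compile(r"^cz-(\d+)$")


def parse_citation(text: str) -> Citation:
    """Classify one citation string and split it into its parts."""
    s = text.strip().lower()
    m = _CA.match(s)
    if m:
        return Citation("CA", int(m.group(1)), m.group(2) == "p",
                        int(m.group(3)), m.group(4) or None)
    m = _SOCMA.match(s)
    if m:
        return Citation("SOCMA", None, False, int(m.group(1)), m.group(2))
    m = _MERCK.match(s)
    if m:
        return Citation("Merck", None, False, int(m.group(1)), m.group(2))
    m = _CZ.match(s)
    if m:
        return Citation("CZ", None, False, int(m.group(1)), None)
    raise ValueError(f"unrecognized citation: {text!r}")


if __name__ == "__main__":
    for ref in ("c63:p13271d", "socma-657b", "merck-0777r", "cz-5655"):
        print(ref, "->", parse_citation(ref))
```

Applied to the examples in the text, c63:p13271d is read as a patent abstract in volume 63, column 13271, portion "d", while cz-5655 yields only a page number.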
SUBSTRUCTURE SEARCH DEMONSTRATION
Chemical Abstracts Service
REQUEST FORM 



Name:. 






Affiliation 



Substructure Request: 
Form 65-162 8/31/66



Batch Number:
Question Number:. 



Date:. 



Chem:. 



Syst:. 



Delivery : 

□ Pickup □ Mail 






[Handwritten substructure request; illegible in this copy]






EXAMPLE 1 - ANSWER 1 



REGISTRY NO. = 87,536
C8H11NO3S

Preferred Name: 4-Thia-1-azabicyclo<3.2.0>heptane-2-carboxylic

acid, 3,3-dimethyl-7-oxo-

socma-657b
C68:4468e
CZ-5655
merck-0777r
C63:p13271d




EXAMPLE 1 - ANSWER 2 



REGISTRY NO. =* i,80a,8S6 

C8H12N2O3S.Na

Preferred Name: 4-Thia-1-azabicyclo<3.2.0>heptane-2-carboxylic acid,

6-amino-3,3-dimethyl-7-oxo-, sodium salt

C6a:i30i eg 

C63:i886lf 




.Na salt 



EXAMPLE 1 - ANSWER 3

REGISTRY NO. = 1,623,640

C10H11NO

Preferred Name: 2 *azetldlnone, s-mefhyl-s-phenyl- 

C 63 : p04860a 
EXAMPLE 1 - ANSWER 4

REGISTRY NO. = 1,683,694

C10H11NO

Preferred Name: 2-Azetidinone, 1-methyl-3-phenyl-

C 63 :po 48 eoa 
EXAMPLE 1 - ANSWER 5

REGISTRY NO. =s i^es3^7S9 
C11H13NO

Preferred Name: a-Azetfdinone^ 1 -methyl ^-p-toly I - 

ca3:po486oa 




EXAMPLE 1 - ANSWER 6

REGISTRY NO. * i^746^oei 

C11H13NO

Preferred Name: 2-Azetidinone, 1,3-dimethyl-3-phenyl-

C83:po42eoa 
SUBSTRUCTURE SEARCH DEMONSTRATION
Chemical Abstracts Service
REQUEST FORM

Affiliation:

Batch Number:
Question Number:. 



Date:. 



Chem:.



Syst:. 



Delivery: 

□ Pickup □ Mail



Substructure Request: 
EXAMPLE 2 - ANSWER 1



REGISTRY NOe =s i^sse^sio 
C21H30N2O4

Preferred Name: Coumarin, 3*^^4-(d!propylamlno)bufyl>carbamoy l>- 

4 -hydroxy-e-methy I - 

C03 : poaoeod 



[Structure diagram]



EXAMPLE 2 - ANSWER 2 

REGISTRY NO, = i^sae^sai 
C21H30N2O4.HCl

Preferred Name: Coumarin, 3«4-fdlpropylamlno)butyl>carbamoyl>. 

4-hydroxy -8-methyl-, hydrochloride 
cas: poaoeod 




[Structure diagram]
EXAMPLE 2 - ANSWER 3

REGISTRY NO. = i.iia.aei
C23H50N.Br

Preferred Name: Ammonium, tripropyltetradecyl-, bromide

ces:o 826 Be 



[Structure diagram]
EXAMPLE 2 - ANSWER 4

REGISTRY NO. = i.4as.387

C20H28N2O4

Preferred Name: Coumarin, 3-<<4-(dipropylamino)butyl>carbamoyl>-4-hydroxy-

C6a:poo5aid 
SUBSTRUCTURE SEARCH DEMONSTRATION
Chemical Abstracts Service
REQUEST FORM



Name:









Affiliation 



Batch Number:, 
Question Number:.



Date:. 



Chem:.



Syst:. 



Delivery: 

□ Pickup □ Mail



Substructure Request: 





[Handwritten substructure request; illegible in this copy]















EXAMPLE 3 - ANSWER 1

REGISTRY NO. = 1,597,768
C24H31FO5

Preferred Name: Preqna-i ,4-diene-3,eo-dtone, iea-f I uoro-iib,i 7 -df hydroxy -ea 

-methyl-, 17 -acetate 

CS7 ; p91 9b 




EXAMPLE 3 - ANSWER 2 

REGISTRY NO. =• i,683,iio 
C24H31FO7

Preferred Name; Proqna-i ,4-dfene-3,so-dfone, ea-f luoro-iib,i 6 a,i 7 ,ei -tetrah 

ydroxy-, cyclic i6,i7-(Et orthoformate) 

C5o;pao7og 
EXAMPLE 3 - ANSWER 3

REGISTRY NO, s i,73S,ai3
C26H33FO6

Preferred Name: Pregna-i «4-diene«a^ao-dfone^ el-f luoro-iib^ida^i 7 ^ai -tetrah 

ydroxy-^ cyclic ie^i7«acetal with cyclopentanone 

C6o:pao7og 






EXAMPLE 3 - ANSWER 4

REGISTRY NO, s i ^eoo^aee
C22H29FO4

Preferred Name: Pregna-i ^4-dlerie-e,ao-dione^ ea-f luoro-iib^i7-dlhydroxy-iea 

-methyl- 

ceo:paa8f 
EXAMPLE 3 - ANSWER 5

REGISTRY NO. = i ^84i , 23 a
C29H33FO6

Preferred Name: Pregna-i ^4-diene-3,eo-dione, ea-f luoro-iib,i 6 a^i 7 ,ei-tetrah 

ydroxy-, cyclic i«^i7>acetal wl+h acetophenone 

C 6 o:p 307 og 
REQUEST FORM



Atoms 
Batch Number:

Question Number: 

Date: 

Chem:

Syst:

Delivery: 

□ Pickup □ Mail 



Substructure Request: 
EXAMPLE 4 - ANSWER 1



REGISTRY NO. = 76,175

C3Cl3F5

Preferred Name: Propane, 1,2,3-trichloropentafluoro-



socma- 1 9sg 

C52:io4ed 

C62:i16Sb 




EXAMPLE 4 - ANSWER 2



REGISTRY NO. = i,699,4ia

C3Cl3F5

Preferred Name: Propane, 1,2,2-trichloropentafluoro-

C51 :pi245C 
C62:i1166b 

cea: 29922 
EXAMPLE 4 - ANSWER 3

REGISTRY NO. = 1,645,789

C3H2BrClF4

Preferred Name; Propane^ i-bromo-i-chloro-a^a^a^a-tetraf luro- 

C60:pi3140h 
EXAMPLE 4 - ANSWER 4



REGISTRY NO. * i ^658^808
C3Cl2F6

Preferred Name: Propane^ s^a-dlchlorohexaf luoro- 



CS8:8879h 
C83:551 5C 




EXAMPLE 4 - ANSWER 5

REGISTRY NO. = i,6sa,ei9
C3Cl3F5

Preferred Name: Propane^ ^a-trlchloropentaf luoro- 



css: pi 4001 f 
c6o:pi3i4oe 

C6S:i 1165b 





SUBSTRUCTURE SEARCH DEMONSTRATION 
Chemical Abstracts Service 
REQUEST FORM 



Name:

Affiliation:



Batch Number:

Question Number:

Date:

Chem:

Syst:

Delivery: 

□ Pickup □ Mail 



Substructure Request: 



[Hand-drawn substructure request; illegible in this copy]



Form 65-162

EXAMPLE 5 - ANSWER 1



REGISTRY NO. = 101,213
C10H12ClNO2



Preferred Name: Carbanilic acid, m-chloro-, isopropyl ester



socma-ssae 

cee:i024e 

cee:ei88d 

C8S:p3337b 

C8e:458ob 

C88:S818f 

C83:4878C 

C8a:4880C 

C83:488sg 

C83:4883C 

C83:88S8f 



m-ClC6H4-NH-CO-O-CH(CH3)2
EXAMPLE 5 - ANSWER 2



REGISTRY NO. = 101,995
C9H11NO2

Preferred Name: Carbanilic acid, ethyl ester 



socma-seo I 

C88:i9e5C 

CZ-8853 

cea:44eog 

cea:44eog 

ce3:7S88d 

C83 : 1 1 891 a 






C6H5-NH-CO-O-C2H5

EXAMPLE 5 - ANSWER 3



REGISTRY NO. = 122,429

C10H13NO2

Preferred Name: Carbanilic acid, isopropyl ester 



socma-868 I 

C6S: 81908 
C6s:sss8h 
C68:SSSSC 
C68:SS34C 
C6s:ssi 9h 
cea:704i f 
C6S:8SS9f 
C6S : 688sd 
C6s:758ed 
C6a:e97sb 
C6a:ii89ia 
C6H5-NH-CO-O-CH(CH3)2



EXAMPLE 5 - ANSWER 4

REGISTRY NO. = i,saa,74S
C11H15NO2

Preferred Name: Carbanilic acid, butyl ester



C6s:pi6i8ie 
C6S:07688d 
C6S : 00451 f 
C6S: 07588 d 



C6H5-NH-CO-OBu

EXAMPLE 5 - ANSWER 5

REGISTRY NO. = 1,542,489
C11H14FNO2

Preferred Name: Carbanilic acid, o-ethyl-, 2-fluoroethyl ester

C52:369lh 
EXAMPLE 5 - ANSWER 6

REGISTRY NO. = 1,542,490
C11H14FNO2

Preferred Name: Carbanilic acid, N-ethyl-, 2-fluoroethyl ester

CS3:i196b 